Art Unit: 2772 

DETAILED ACTION 

Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

1. Claims 1-3, 11, 12, and 21 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Adelson (U.S. Patent 5,706,417) in view of Yeo et al. (U.S. Patent 5,821,945). 

Claim 1 lays claim to a method of representing video information comprising the steps of 
dividing a stream into scenes, each scene into frames including a key frame, and also dividing 
scenes into layers using intra-scene motion analysis, and storing content-related appearance 
attributes or mosaic representations in a database. 

Claim 2 adds to claim 1 the step of dividing the scene into a foreground layer and a 
background layer, and creating a 2-D mosaic representation of these. 

Adelson teaches that a layer exists for each object, set of objects, or portion of an object 
in the image having a motion vector significantly different from any other object in the image 
(Col. 2: lines 45-47), thereby teaching the use of intra-scene motion analysis. He also teaches 
combining the foreground and background images to produce a video image, thereby implicitly 
teaching mosaic representation (Col. 2: lines 15-21; Col. 6: lines 50-55). Adelson also teaches 
content-related appearance attributes for each layer through the use of an intensity map, an 
attenuation map, a velocity map, and a delta map (Col. 2: lines 50-67), and implicitly teaches 
storing these attributes in a database. Adelson does not teach dividing a continuous video stream 
into video scenes, and scenes into frames including a key frame. Yeo discloses dividing the 
sequence into equal-length segments, denoting the first frame of each segment as its key frame 
(Col. 1: lines 34-38), and also teaches classifying a long video sequence into story units 
(Col. 1: lines 47-50). Hence it would be obvious to one skilled in the art at the time the 
invention was made to divide scenes into video frames with a key frame for each scene, as this 
will provide an effective means of browsing the video content. 
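
For illustration only, the segmentation attributed to Yeo above (equal-length segments, with the first frame of each segment denoted as its key frame) can be sketched as follows. This is a minimal sketch and not code from any cited reference; the function name, the dictionary layout, and the handling of a shorter final segment are assumptions made for the example.

```python
def segment_with_key_frames(num_frames, segment_length):
    """Split frame indices 0..num_frames-1 into equal-length segments.

    The first frame of each segment is recorded as its key frame.
    Assumption: a trailing segment shorter than segment_length is kept.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    segments = []
    for start in range(0, num_frames, segment_length):
        end = min(start + segment_length, num_frames)
        segments.append({"key_frame": start, "frames": list(range(start, end))})
    return segments


if __name__ == "__main__":
    for segment in segment_with_key_frames(10, 4):
        print(segment)
```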



Claim 3 adds to claim 1 the steps of storing the scenes in a mass storage unit, and 
retrieving scenes associated with an attribute. 

Adelson teaches the use of a video tape player or laser disc player as a data source for image 
pixel data (Col. 4: lines 2-7, 16-20). This implicitly teaches using a mass storage unit to store 
data representing the scenes. Adelson also teaches having various maps for the various attributes 
(Col. 2: lines 55-67; Col. 5: lines 9-17), and retrieving data easily to reconstruct an image, based 
on the required image (Col. 6: lines 30-47). 

Claim 11 adds to claim 1 the steps of storing ancillary information related to layers or 
frames. 

Adelson teaches the use of optional maps, including a contrast change map and a blur 
map for each layer (Col.3: lines 6-14). 

Claim 12 is a method for generating a video information database, comprising the steps 
of segmenting a video stream, identifying a plurality of attributes, and storing them in a database. 

Claim 12 is a claim to a method that implements the method for representing video 
information described in claim 1, and is rejected with the same rationale. 

Claim 21 is a claim for a computer readable medium that implements the method as 
claimed in claim 1 and hence is rejected for the same reasons. 

2. Claim 4 is rejected under 35 U.S.C. 103(a) as being unpatentable over Adelson (U.S. 
Patent 5,706,417) in view of Yeo et al. (U.S. Patent 5,821,945), as applied to claim 3, and further 
in view of Burt et al. (U.S. patent 5,649,032). 

Claim 4 adds to claim 3 the limitation that the mosaic representation is one of a two 
dimensional, a three dimensional, and a network of mosaics. 

Burt teaches aligning and combining images or other mosaics to form a mosaic (Col.3: 
lines 38-48; Col. 5: lines 27-36). Hence it would be obvious to one skilled in the art at the time 
the invention was made to combine various layers/images to generate a mosaic representation as 
this will provide the user greater flexibility in altering the image scene to suit their needs. 

Application/Control Number: 08/970889 

3. Claims 5-8 are rejected under 35 U.S.C. 103(a) as being unpatentable over Adelson (U.S. 
Patent 5,706,417) in view of Yeo et al. (U.S. Patent 5,821,945), as applied to claim 1, and further 
in view of Burt et al. (U.S. patent 5,649,032). 

Claim 5 adds to claim 1 the steps of generating an image pyramid for a layer, filtering 
such that each subband is associated with feature maps, and integrating feature maps to produce 
attribute pyramid subbands, which comprise content-based appearance attribute subband 
associated with a corresponding image pyramid subband. 

Adelson discloses the use of subbands to encode images (Col. 1: lines 20-24). Adelson 
also teaches the feature maps associated with each layer (Col.2: lines 55-67; Col. 5: lines 9-17), 
and integrating the feature maps to reconstruct an image (Col.6: lines 30-47). Adelson and Yeo 
fail to teach image pyramids. Burt teaches the use of image pyramid framework in the alignment 
process, and converting the input image and the mosaic into Laplacian image pyramids, and 
applying the alignment to all levels within the respective pyramids. Hence it would be obvious to 
one skilled in the art at the time the invention was made to use the image pyramid in each layer in 
order to achieve better alignment and reproduction of the image. 
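
As a rough illustration of the Laplacian image pyramid mentioned above, the sketch below builds and collapses a pyramid over a one-dimensional signal standing in for an image row. It is a minimal sketch under stated assumptions, not Burt's implementation: the [1, 2, 1]/4 smoothing kernel, the nearest-neighbour expansion, and all function names are hypothetical choices made for the example.

```python
def reduce_level(signal):
    """Smooth with a [1, 2, 1]/4 kernel (edges clamped), then keep every second sample."""
    n = len(signal)
    smoothed = [
        (signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4.0
        for i in range(n)
    ]
    return smoothed[::2]


def expand_level(signal, length):
    """Upsample to the requested length by repeating samples (nearest neighbour)."""
    return [signal[i // 2] for i in range(length)]


def build_laplacian_pyramid(signal, levels):
    """Return [band_0, ..., band_k, residual]; each band is a level minus its expanded coarser level."""
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        if len(current) < 2:
            break
        coarser = reduce_level(current)
        predicted = expand_level(coarser, len(current))
        pyramid.append([a - b for a, b in zip(current, predicted)])
        current = coarser
    pyramid.append(current)  # low-pass residual
    return pyramid


def collapse_pyramid(pyramid):
    """Invert build_laplacian_pyramid by expanding and adding bands, coarsest first."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        expanded = expand_level(current, len(band))
        current = [a + b for a, b in zip(expanded, band)]
    return current
```

Because each band stores exactly what the coarser level fails to predict, collapsing the pyramid recovers the input up to floating-point rounding.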

Claim 6 adds to claim 5 the limitation that the attribute comprises at least one of 
luminance, chrominance, and texture. 

Adelson discloses the use of an intensity map, a depth map, a blur map, and a contrast change map 
(Col. 2: lines 55-67; Col. 5: lines 9-17). 

Claim 7 adds to claim 5 the step of rectifying the feature maps associated with each 
subband. 

Adelson discloses the use of a delta map, which is essentially an additive error map that 
provides correction data for any changes in the image over time which cannot be accounted for 
by the other maps. 

Claim 8 adds to claim 5 the step of collapsing the attribute pyramid subbands to produce 
a content-based appearance attribute. 

Yeo teaches that the lower levels of the hierarchy can be based on visual cues, while the 
upper levels allow criteria that reflect semantic information associated (Col. 5: lines 48-52), the 
nodes capturing the contents of a video, while the edges capture its structure. Yeo also teaches a 
tree hierarchy that permits the user to have a coarse-to-fine view of the entire video sequences 
based on the level of the nodes (Col. 4: lines 30-35), the nodes capturing the core contents of the 
video while the edges capture its structure (Col. 5: lines 40-43). Hence it would be obvious to one 
skilled in the art at the time the invention was made to collapse the attribute pyramid subbands to 
produce a content-based appearance attribute since this will offer a browsing structure that 
closely resembles human perception and understanding. 

4. Claims 9-10 are rejected under 35 U.S.C. 103(a) as being unpatentable over Adelson 
(U.S. Patent 5,706,417) in view of Yeo et al. (U.S. Patent 5,821,945), as applied to claim 1, and 
further in view of Barber et al. (U.S. Patent 5,751,286). 

Claim 9 adds to claim 1 the steps of receiving a request matching a desired content-related 
appearance attribute, and retrieving at least one layer matching the request. 

Adelson teaches a method of retrieving data representing layers, each layer comprising a 
series of maps, to reconstruct an image. Barber teaches a method of building a visual query by 
image content, and retrieving database images with features that correspond to the selected image 
characteristics (Col.2: line 64 - Col.3: line 8). Hence it would be obvious to one skilled in the art 
at the time the invention was made to query the database by content-related appearance attribute, 
and retrieve layers that match the attribute, in order to reconstruct the image as desired, as such 
an approach will save database storage requirements. 

Claim 10 adds to claim 9 the steps of identifying a query type as being one of luminance, 
chrominance, and texture type, and a query specification as being a desired property of the query 
type, and selecting a filter type and calculating the appearance attribute based on filter type and 
desired property. 

Barber discloses a query construction interface with hierarchical selection windows for 
each of image color, shapes, textures, and category, which may include keywords, text, or conditions 
(Col. 3: lines 22-34). Barber also teaches filtering the masks in the current image by the category 
code, establishing the set of masks that will be analyzed with respect to the image characteristic 
values (Col. 12: lines 1-5). Barber also teaches computing a positional feature score that compares 
the area's similarity to the image areas (Col. 14: lines 40-60). Hence it would be obvious to one 
skilled in the art at the time the invention was made to use a query type to choose the parameter, 
and a specification to specify a desired property for the parameter, as this would facilitate 
retrieving only the layers that match the selection criteria, and hence would increase the speed of 
rendering the image. 

5. Claims 13-14 are rejected under 35 U.S.C. 103(a) as being unpatentable over Adelson 
(U.S. Patent 5,706,417) in view of Yeo et al. (U.S. Patent 5,821,945), as applied to claim 12, and 
further in view of Zhang et al. (U.S. Patent 5,635,982). 

Claim 13 adds to claim 12 the steps of generating a descriptor vector, and generating a 
scene cut indicium in response to calculated differences between descriptive vectors of 
successive frames exceeding a threshold. 

Adelson teaches generating an intensity map, an attenuation map, a velocity map, and a 
delta map for each layer. Zhang teaches calculating the differences between consecutive video 
frames based on the selected difference metric, and defining a cut if the values exceed a threshold 
value (Col. 7: lines 1-10; Col. 8: lines 5-15). Hence it would be obvious to one skilled in the art at 
the time the invention was made to generate a scene cut if the calculated differences between 
descriptive vectors exceeded a threshold value, as this would minimize the calculation needed to 
detect scene cuts. 
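
The threshold test described above can be sketched as follows. This is an illustrative sketch only: the L1 (sum of absolute differences) metric, the function name, and the sample descriptor vectors are assumptions made for the example and are not drawn from Zhang or from the application.

```python
def detect_scene_cuts(descriptors, threshold):
    """Return indices i where frame i differs from frame i-1 by more than threshold.

    descriptors: one numeric vector per frame (for example, a coarse histogram).
    Assumption: the difference metric is the L1 distance between vectors.
    """
    cuts = []
    for i in range(1, len(descriptors)):
        difference = sum(abs(a - b) for a, b in zip(descriptors[i - 1], descriptors[i]))
        if difference > threshold:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    # Hypothetical 3-bin descriptors for four successive frames.
    sample = [[10, 0, 0], [9, 1, 0], [0, 0, 10], [0, 1, 9]]
    print(detect_scene_cuts(sample, threshold=5))
```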



Claim 14 adds to claim 12 the steps of generating a descriptor vector and a threshold for 
it in the first pass, and calculating the difference between the frames and generating a scene cut 
indicium in the second pass, if the difference exceeds the threshold value. 

Zhang teaches a multi-pass approach, wherein the prospective segment boundaries are 
determined in the first pass, by comparing against a threshold value. This implies the use of a 
descriptor vector to define a frame, such that they can be compared against a threshold value. 
Zhang teaches using the second pass to locate all boundaries (scene cuts). Zhang also teaches 
using the multi-pass approach to apply different difference metrics in different passes (Col.6: 
lines 20-64), and teaches defining cuts based on the differences in the difference metrics (Col.8: 
lines 5-15). Hence it would be obvious to one skilled in the art at the time the invention was 
made to use two passes as described in this claim to compute the attribute value, as this would 
provide more accurate values for the attribute. 
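
A two-pass arrangement of the kind recited in claim 14 might look like the sketch below: the first pass generates a descriptor vector per frame and a threshold, and the second pass compares successive descriptors against that threshold. Everything specific here is an assumption for illustration, including the 4-bin intensity histogram descriptor, the L1 metric, and the rule setting the threshold to a fraction of the largest possible histogram difference; none of it is taken from Zhang or the application.

```python
def frame_descriptor(frame, bins=4, max_value=256):
    """Coarse intensity histogram of a frame given as a flat list of pixel values."""
    histogram = [0] * bins
    for pixel in frame:
        histogram[min(pixel * bins // max_value, bins - 1)] += 1
    return histogram


def l1_distance(a, b):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def two_pass_scene_cuts(frames, fraction=0.5):
    """Return (cut_indices, threshold) using a hypothetical two-pass scheme."""
    # Pass 1: descriptor per frame, plus a threshold derived from the descriptors.
    descriptors = [frame_descriptor(frame) for frame in frames]
    if not descriptors:
        return [], 0.0
    mean_pixels = sum(sum(d) for d in descriptors) / len(descriptors)
    # Two histograms over N pixels differ by at most 2N in L1 distance.
    threshold = fraction * 2.0 * mean_pixels
    # Pass 2: flag a cut wherever successive descriptors differ by more than the threshold.
    cuts = [
        i
        for i in range(1, len(descriptors))
        if l1_distance(descriptors[i - 1], descriptors[i]) > threshold
    ]
    return cuts, threshold
```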

6. Claim 15 is rejected under 35 U.S.C. 103(a) as being unpatentable over Adelson (U.S. 
Patent 5,706,417) in view of Yeo et al. (U.S. Patent 5,821,945), as applied to claim 12, and 
further in view of Burt et al. (U.S. Patent 5,649,032). 

Claim 15 adds to claim 12 the steps of identifying a key frame within each segment, and 
representing the scenes as 2-D mosaics, 3-D mosaics, or 3-D structures. 

Yeo teaches identifying a key frame in each segment (Col. 1: lines 60-65; Col. 2: lines 40-45). 
Burt teaches a mosaic construction system that combines input images to form a mosaic 
(Col.4: lines 26-36). Hence it would be obvious to one skilled in the art at the time the invention 
was made to use a key frame to retrieve the desired video segments, and represent the scenes in a 
mosaic representation, as this will reduce index storage requirements, and also fetch the related 
video frames faster. 

7. Claim 16 is rejected under 35 U.S.C. 103(a) as being unpatentable over Adelson (U.S. 
Patent 5,706,417) in view of Yeo et al. (U.S. Patent 5,821,945), as applied to claim 12, and 
further in view of Barber et al. (U.S. Patent 5,751,286). 

Claim 16 adds to claim 12 the steps of filtering at least one frame of each scene with at 
least one filter of a pre-determined attribute, and at least one frame of each scene with at least 
one filter of a second pre-determined attribute. 

Barber teaches a method of filtering the masks in the current image with the category 
code in the data structures, for each of the set of data structures corresponding to a query. Since 
the categories are processed sequentially, it is implied that this process involves multiple passes 
through the frames, once for each category. Hence it would be obvious to one skilled in the art at 
the time the invention was made to use multiple passes, one for each selected filter attribute, as 
this would help produce as seamless a mosaic as possible. 

8. Claims 17-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Barber et al. 
(U.S. Patent 5,751,286) in view of Yeo et al. (U.S. Patent 5,821,945). 

Claim 17 claims a method for browsing a video program, comprising the steps of 
providing a database comprising attribute information, formulating a query utilizing the attribute 
information, and searching and retrieving video frames that substantially match the query 
criterion. 

Barber teaches a query facility which builds a visual query by image content, and also 
teaches a query engine that interprets the query, and returns database images with features that 
correspond to the selected criteria (Col. 2: line 64 - Col. 3: line 8). Barber does not teach the 
notion of a representative video frame for a video scene. Yeo discloses a method for 
content-based video browsing, containing a video database, and sets of key frames that have associated 
attributes, the key frames representing the long sequence of related shots (Col. 2: lines 35-45). 
Yeo also teaches the use of Rframes (representative frames) to organize the visual contents of the 
video clips (Col. 1: lines 30-65). Hence it would be obvious to one skilled in the art at the time 
the invention was made to build a query to retrieve the representative frames, as this would be a 
faster way to identify areas of interest before retrieving all the related frames. 

Claim 18 adds to claim 17 the steps of selecting a query type, query specification, and 
computing a multi-dimensional feature vector. 

Barber teaches query specification for image characteristics (query type) (Col. 13: lines 
44-53). Barber also teaches calculating a positional feature score combining features and 
positional similarity for each of the areas selected in the query (Col. 15: lines 40-61). 

Claim 19 adds to claim 18 the limitation of selecting a query specification by identifying 
a portion of the displayed image, and the feature vector is calculated based on query type and the 
identified image portion. 

Barber teaches specification in a query of image characteristics that occur in some area or 
areas of the image (Col. 13: lines 45-52). 

Claim 20 adds to claim 19 the steps of formatting and transmitting the identified video 
frames. 

Barber teaches returning the images with the best scores in response to a query (Col. 14: 
lines 65-67). 

Conclusion 

Any response to this correspondence should be mailed to: 
Box AF 

Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

or faxed to: 

(703) 305-9051, (for formal communications; please mark "EXPEDITED 
PROCEDURE") 

Or: 

(703) 308-6606 (for informal or draft communications, please label 
"PROPOSED" or "DRAFT") 

Hand-delivered responses should be brought to Crystal Park II, 2021 Crystal 
Drive, Arlington, VA, Sixth Floor (Receptionist). 

Any inquiry concerning this communication or earlier communications from the examiner 
should be directed to Mano Padmanabhan whose telephone number is (703) 306-2903. She can 
normally be reached Monday-Thursday from 6:30am-5:00pm. 

If attempts to reach the examiner are unsuccessful, the examiner's supervisor, Mark Powell, 
can be reached on (703) 305-9703. 

Any inquiry of a general nature or relating to the status of this application should be directed 
to the Group receptionist whose telephone number is (703) 305-3900. 





Mano Padmanabhan 



PRIMARY EXAMINER 



July 7, 1999 



